June 29, 2016

Probationer Population

  • Mostly male
  • Mostly not murderers (<1%), but dangerous
  • Overcrowded jails mean early release from state prisons

LA County

Where did this project start?

Berk's study (Variable Importance Plot)

Feature selection

##                 freqRatio percentUnique zeroVar   nzv
## Murder         129.421875    0.01198035   FALSE  TRUE
## Age              1.053819    0.38936145   FALSE FALSE
## White            4.662822    0.01198035   FALSE FALSE
## Male             8.717113    0.01198035   FALSE FALSE
## ZIP              1.273859    3.73187972   FALSE FALSE
## Total_Pop        1.273859    3.54618426   FALSE FALSE
## Black_Pop        1.273859    3.28860669   FALSE FALSE
## Prop_Black       1.273859    3.42638074   FALSE FALSE
## Income           1.273859    3.51623338   FALSE FALSE
## PRIMARY CHARGE   1.296763    2.59374626   FALSE FALSE
## Gang             1.883745    0.01198035   FALSE FALSE
## RegisterSO      50.684211    0.01198035   FALSE  TRUE
## ViolentCase     11.704718    0.01198035   FALSE FALSE
## WeaponCase     104.658228    0.01198035   FALSE  TRUE
## DrugCase       537.516129    0.01198035   FALSE  TRUE
## MH               3.310354    0.01198035   FALSE FALSE
## Zip_Present      7.159335    0.01198035   FALSE FALSE
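
The column names above match what caret's nearZeroVar(..., saveMetrics = TRUE) reports. As a rough illustration of what the numbers mean, here is a base-R sketch of the same metrics. The helper name nzv_metrics and the toy vector are made up for this example, and the cutoffs (95/5 and 10) are assumed to be caret's defaults; the slides do not say which settings were used.

```r
# Sketch only: base-R approximation of the near-zero-variance metrics.
# nzv_metrics() is a hypothetical helper, not part of caret.
nzv_metrics <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x), decreasing = TRUE)

  # Ratio of the most common value's count to the second most common value's count
  freq_ratio <- if (length(counts) >= 2) counts[[1]] / counts[[2]] else 0

  # Distinct values as a percentage of all observations
  pct_unique <- 100 * length(counts) / length(x)

  zero_var <- length(counts) <= 1
  data.frame(
    freqRatio     = freq_ratio,
    percentUnique = pct_unique,
    zeroVar       = zero_var,
    nzv           = zero_var || (freq_ratio > freq_cut && pct_unique <= unique_cut)
  )
}

# Toy binary outcome (not the project data): 10 positives in 1,000 rows
toy <- c(rep(0, 990), rep(1, 10))
print(nzv_metrics(toy))
```

Under these assumed cutoffs, a binary column is flagged once its majority class outnumbers the minority class by more than 19 to 1. That fits the table: Murder, RegisterSO, WeaponCase and DrugCase are flagged, while ViolentCase (about 11.7 to 1) is not.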

Model 1

fit <- randomForest(Murder ~ Age + White + Male + Total_Pop + 
                        Black_Pop + Prop_Black + Income + 
                        Zip_Present + Gang + ViolentCase, 
                    data = train, 
                    importance = TRUE, 
                    ntree = 1500)

Model 1 ROC

Model 1 Variable Importance

Model 2

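fit2 <- randomForest(Murder ~ Age + Total_Pop + Black_Pop +
                         Prop_Black + Income + Zip_Present +
                         ViolentCase,
                     data = train,
                     importance = TRUE,
                     ntree = 1500,
                     mtry = 2,                           # predictors tried at each split
                     cutoff = c(0.65, 0.30),             # class with the largest votes/cutoff ratio wins
                     sampsize = c("0" = 100, "1" = 34),  # rows drawn per tree from each stratum
                     strata = as.factor(train$Murder),
                     keep.inbag = TRUE,                  # record which rows each tree sampled
                     na.action = na.roughfix)            # impute NAs with column median/mode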
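
To make the cutoff argument concrete: randomForest assigns the class with the largest ratio of vote fraction to cutoff. The sketch below mimics that rule in base R on made-up vote fractions. The function classify_with_cutoff and the votes matrix are illustrative only and are not taken from the fitted model.

```r
# Sketch only: the votes/cutoff rule applied to hypothetical vote fractions.
classify_with_cutoff <- function(vote_frac, cutoff = c("0" = 0.65, "1" = 0.30)) {
  # Divide each column of vote fractions by that class's cutoff
  ratio <- sweep(vote_frac, 2, cutoff, "/")
  # The predicted class is the column with the largest ratio
  colnames(vote_frac)[max.col(ratio, ties.method = "first")]
}

# Hypothetical vote fractions for three cases (each row sums to 1)
votes <- rbind(c(0.80, 0.20),
               c(0.68, 0.32),
               c(0.50, 0.50))
colnames(votes) <- c("0", "1")

print(classify_with_cutoff(votes))

# Share of trees that must vote "1" before class "1" wins
threshold <- 0.30 / (0.65 + 0.30)
print(threshold)
```

With these cutoffs a case is labeled "1" once a little under a third of the trees vote for it, which trades more false positives for fewer false negatives.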

Model 2 ROC

Model 2 Variable Importance

Model 2 Confusion Matrix

##    
##        0    1
##   0 5898   14
##   1  707   37
##          Sensitivity          Specificity       Pos Pred Value 
##          0.725490196          0.892959879          0.049731183 
##       Neg Pred Value           Prevalence       Detection Rate 
##          0.997631935          0.007662260          0.005558894 
## Detection Prevalence    Balanced Accuracy 
##          0.111778846          0.809225037
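
The statistics above can be re-derived from the four cell counts. The sketch below assumes rows are predictions, columns are observed outcomes, and "1" is the positive class. The output does not label the axes, but the reported values only reproduce under that reading.

```r
# Sketch only: recompute the reported statistics from the 2x2 table.
# Assumption: rows = predicted class, columns = observed class, positive = "1".
cm <- matrix(c(5898, 707, 14, 37), nrow = 2,
             dimnames = list(Predicted = c("0", "1"), Reference = c("0", "1")))

tp <- cm["1", "1"]  # predicted 1, observed 1
fn <- cm["0", "1"]  # predicted 0, observed 1
fp <- cm["1", "0"]  # predicted 1, observed 0
tn <- cm["0", "0"]  # predicted 0, observed 0
n  <- sum(cm)

metrics <- c(
  Sensitivity         = tp / (tp + fn),
  Specificity         = tn / (tn + fp),
  PosPredValue        = tp / (tp + fp),
  NegPredValue        = tn / (tn + fn),
  Prevalence          = (tp + fn) / n,
  DetectionRate       = tp / n,
  DetectionPrevalence = (tp + fp) / n
)
metrics["BalancedAccuracy"] <- mean(metrics[c("Sensitivity", "Specificity")])

print(round(metrics, 4))
```

Read this way, the model misses 14 of the 51 observed positives, and 707 of the 744 cases it flags are false positives.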

Ongoing Evaluation

  • Context, context, context
  • False negatives are to be avoided
  • Comparison to logistic regression and LS/CMI

Implementation

  • 2,300 early releases

Algorithms in the news